Examining New York City Crash Data

Emmanuel Yankson

Inspiration & Previous Works

  • New York City traffic is notoriously chaotic. By studying crash data and finding where accidents happen most often, we can help make navigating NYC safer for everyone.
  • The work done by those who participated in the 2022 Data Jamboree laid some of the foundation for the model used.

Viewing the Data

:::{.r-fit-text}

  • We can take a chunk of our data and plot it with Google Maps to see where these accidents are taking place.

|   | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/01/2018 | 4:16 | MANHATTAN | 10025 | 40.801800 | -73.961080 | (40.8018, -73.96108) | CATHEDRAL PARKWAY | MORNINGSIDE DRIVE | None | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Other Vehicular | Following Too Closely | None | None | None | 3820157 | Sedan | Sedan | None | None | None |
| 1 | 01/01/2018 | 20:30 | QUEENS | 11373 | 40.743973 | -73.885100 | (40.743973, -73.8851) | BROADWAY | BAXTER AVENUE | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unsafe Speed | Failure to Yield Right-of-Way | None | None | None | 3818846 | Sedan | Sedan | None | None | None |
| 2 | 01/01/2018 | 15:30 | MANHATTAN | 10025 | 40.801740 | -73.964770 | (40.80174, -73.96477) | WEST 108 STREET | AMSTERDAM AVENUE | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | None | None | None | 3818947 | Sedan | None | None | None | None |
| 3 | 01/01/2018 | 12:10 | QUEENS | 11354 | 40.763073 | -73.816345 | (40.763073, -73.816345) | None | None | 40-07 149 STREET | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Backing Unsafely | Unspecified | None | None | None | 3820645 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | None | None | None |
| 4 | 01/01/2018 | 18:35 | BRONX | 10459 | 40.820305 | -73.890830 | (40.820305, -73.89083) | BRUCKNER BOULEVARD | HUNTS POINT AVENUE | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | None | None | None | 3819261 | Station Wagon/Sport Utility Vehicle | Sedan | None | None | None |

  • Now that we’ve looked at the map, let’s look at some hotspots that appear in our data.
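The map itself is not reproduced here, and the slides do not show how it was built. As a tool-neutral sketch, crash coordinates can be written out as GeoJSON points, a format most mapping tools accept. The function name `to_geojson` and the dictionary keys are assumptions made for this example; the coordinates and collision IDs are the five sample rows printed above.

```python
import json

# Sample rows copied from the table above (COLLISION_ID, BOROUGH, LATITUDE, LONGITUDE).
crashes = [
    {"collision_id": 3820157, "borough": "MANHATTAN", "lat": 40.801800, "lon": -73.961080},
    {"collision_id": 3818846, "borough": "QUEENS", "lat": 40.743973, "lon": -73.885100},
    {"collision_id": 3818947, "borough": "MANHATTAN", "lat": 40.801740, "lon": -73.964770},
    {"collision_id": 3820645, "borough": "QUEENS", "lat": 40.763073, "lon": -73.816345},
    {"collision_id": 3819261, "borough": "BRONX", "lat": 40.820305, "lon": -73.890830},
]


def to_geojson(rows):
    """Convert crash rows into a GeoJSON FeatureCollection of points."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates as [longitude, latitude], in that order.
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {
                "collision_id": row["collision_id"],
                "borough": row["borough"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_geojson(crashes)
geojson_text = json.dumps(collection, indent=2)  # ready to save as a .geojson file
```

Rows without coordinates would need to be dropped before this step, since a point feature cannot be built from a missing latitude or longitude.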

Examining Hotspots

  • We can go further and break the incidents down by who was injured: pedestrians, cyclists, or motorists.

  • The resulting table and graph show injury counts by borough:

               Cyclists Injured  Pedestrians Injured  Motorists Injured
MANHATTAN                  6194                 6827               9729
QUEENS                     4155                 8373              26527
BRONX                      2432                 5609              15521
BROOKLYN                   8345                11261              30264
STATEN ISLAND               264                  856               4002

  • Brooklyn has the highest injury counts in all three categories, so let’s focus more closely on this borough.
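The aggregation behind a table like the one above is a group-by on `BOROUGH` that sums the injury columns. The sketch below uses only the standard library; the helper names `make_row` and `injuries_by_borough` are assumptions, the five sample rows are crash records printed elsewhere in these slides, and `published` simply restates the borough table so the ranking can be checked.

```python
from collections import defaultdict

INJURY_COLUMNS = [
    "NUMBER OF CYCLIST INJURED",
    "NUMBER OF PEDESTRIANS INJURED",
    "NUMBER OF MOTORIST INJURED",
]


def make_row(borough, cyclists, pedestrians, motorists):
    """Build a crash row holding only the columns this sketch needs."""
    return dict(zip(["BOROUGH"] + INJURY_COLUMNS,
                    [borough, cyclists, pedestrians, motorists]))


def injuries_by_borough(rows):
    """Sum each injury column per borough (a group-by on BOROUGH)."""
    totals = defaultdict(lambda: dict.fromkeys(INJURY_COLUMNS, 0))
    for row in rows:
        for column in INJURY_COLUMNS:
            totals[row["BOROUGH"]][column] += int(row[column])
    return dict(totals)


# Rows with index 0, 1, 4, 5 and 70 from the tables in these slides.
sample_rows = [
    make_row("MANHATTAN", 0, 0, 1),
    make_row("QUEENS", 0, 0, 0),
    make_row("BRONX", 0, 0, 0),
    make_row("BROOKLYN", 0, 0, 0),
    make_row("BROOKLYN", 0, 0, 1),
]
sample_totals = injuries_by_borough(sample_rows)

# The borough table as published: (cyclists, pedestrians, motorists) injured.
published = {
    "MANHATTAN": (6194, 6827, 9729),
    "QUEENS": (4155, 8373, 26527),
    "BRONX": (2432, 5609, 15521),
    "BROOKLYN": (8345, 11261, 30264),
    "STATEN ISLAND": (264, 856, 4002),
}
worst_borough = max(published, key=lambda borough: sum(published[borough]))
```

Five rows are far too few to say anything about the city; they only exercise the function. The ranking step on the published table is what supports the choice of Brooklyn.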

Focusing on Brooklyn Borough

|   | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 01/01/2018 | 13:50 | BROOKLYN | 11207 | 40.658920 | -73.889824 | (40.65892, -73.889824) | NEW JERSEY AVENUE | LINDEN BOULEVARD | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | None | None | None | 3820853 | Sedan | None | None | None | None |
| 6 | 01/01/2018 | 1:37 | BROOKLYN | 11212 | 40.662277 | -73.910780 | (40.662277, -73.91078) | LIVONIA AVENUE | BRISTOL STREET | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Backing Unsafely | Unspecified | None | None | None | 3819256 | Sedan | Sedan | None | None | None |
| 10 | 01/01/2018 | 5:00 | BROOKLYN | 11229 | 40.604576 | -73.938220 | (40.604576, -73.93822) | None | None | 2024 GERRITSEN AVENUE | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inattention/Distraction | Unspecified | None | None | None | 3820673 | Station Wagon/Sport Utility Vehicle | Station Wagon/Sport Utility Vehicle | None | None | None |
| 14 | 01/01/2018 | 22:40 | BROOKLYN | 11212 | 40.661907 | -73.927574 | (40.661907, -73.927574) | RUTLAND ROAD | EAST 92 STREET | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Passing or Lane Usage Improper | Unspecified | None | None | None | 3821039 | Sedan | Sedan | None | None | None |
| 15 | 01/01/2018 | 2:30 | BROOKLYN | 11237 | 40.699190 | -73.914690 | (40.69919, -73.91469) | MYRTLE AVENUE | IRVING AVENUE | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Pavement Slippery | Unspecified | None | None | None | 3820937 | Sedan | Sedan | None | None | None |

  • For this disaster of a borough, let’s do some further visual analysis before we do some machine learning. When are most of the crashes occurring?
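One way to answer the "when" question is to reduce each `CRASH TIME` to its hour and count crashes per hour. This is a minimal sketch, not the code behind the original chart; `hour_of` is a hypothetical helper, and the five times are the Brooklyn rows printed above.

```python
from collections import Counter

# CRASH TIME values of the five Brooklyn rows shown above.
crash_times = ["13:50", "1:37", "5:00", "22:40", "2:30"]


def hour_of(crash_time):
    """Return the hour (0-23) from an 'H:MM' or 'HH:MM' string."""
    return int(crash_time.split(":")[0])


crashes_per_hour = Counter(hour_of(t) for t in crash_times)

# Fill in hours with no crashes so the result can be plotted as a
# complete 24-hour profile.
hourly_profile = [crashes_per_hour.get(hour, 0) for hour in range(24)]
```

Run over the full Brooklyn subset, `hourly_profile` gives the heights for a bar chart by hour of day.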

:::{.r-fit-text} - Running the analysis further with the zip codes

  • Zip code 11207 stands out as a real problem area.
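Ranking zip codes comes down to filtering to the borough and counting rows per `ZIP CODE`. The sketch below assumes that reading of the analysis and uses only rows printed in these slides; zip codes are kept as strings so they are treated as labels rather than numbers.

```python
from collections import Counter

# (row index, BOROUGH, ZIP CODE) for rows printed in these slides.
rows = [
    (0, "MANHATTAN", "10025"),
    (1, "QUEENS", "11373"),
    (4, "BRONX", "10459"),
    (5, "BROOKLYN", "11207"),
    (6, "BROOKLYN", "11212"),
    (10, "BROOKLYN", "11229"),
    (14, "BROOKLYN", "11212"),
    (15, "BROOKLYN", "11237"),
]

# Keep Brooklyn only, then count collisions per zip code.
brooklyn_zip_counts = Counter(
    zip_code for _, borough, zip_code in rows if borough == "BROOKLYN"
)

# Zip codes ordered from most to fewest collisions.
ranked = brooklyn_zip_counts.most_common()
```

In this handful of rows 11212 happens to come first, which only shows how little five records say; the 11207 finding comes from the full dataset.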

Examining Zip Code 11207 in Brooklyn, NYC

|   | CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | NUMBER OF PERSONS INJURED | NUMBER OF PERSONS KILLED | NUMBER OF PEDESTRIANS INJURED | NUMBER OF PEDESTRIANS KILLED | NUMBER OF CYCLIST INJURED | NUMBER OF CYCLIST KILLED | NUMBER OF MOTORIST INJURED | NUMBER OF MOTORIST KILLED | CONTRIBUTING FACTOR VEHICLE 1 | CONTRIBUTING FACTOR VEHICLE 2 | CONTRIBUTING FACTOR VEHICLE 3 | CONTRIBUTING FACTOR VEHICLE 4 | CONTRIBUTING FACTOR VEHICLE 5 | COLLISION_ID | VEHICLE TYPE CODE 1 | VEHICLE TYPE CODE 2 | VEHICLE TYPE CODE 3 | VEHICLE TYPE CODE 4 | VEHICLE TYPE CODE 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 01/01/2018 | 13:50 | BROOKLYN | 11207 | 40.658920 | -73.889824 | (40.65892, -73.889824) | NEW JERSEY AVENUE | LINDEN BOULEVARD | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | None | None | None | 3820853 | Sedan | None | None | None | None |
| 40 | 01/01/2018 | 7:51 | BROOKLYN | 11207 | 40.675632 | -73.898780 | (40.675632, -73.89878) | ATLANTIC AVENUE | GEORGIA AVENUE | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Unspecified | Unspecified | None | None | None | 3820850 | Sedan | Station Wagon/Sport Utility Vehicle | None | None | None |
| 70 | 01/01/2018 | 18:20 | BROOKLYN | 11207 | 40.680664 | -73.902626 | (40.680664, -73.902626) | CONWAY STREET | BUSHWICK AVENUE | None | 1.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | Driver Inattention/Distraction | Unspecified | Unspecified | None | None | 3820863 | Station Wagon/Sport Utility Vehicle | Sedan | Sedan | None | None |
| 109 | 01/01/2018 | 22:30 | BROOKLYN | 11207 | 40.659930 | -73.891655 | (40.65993, -73.891655) | PENNSYLVANIA AVENUE | HEGEMAN AVENUE | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Driver Inexperience | Unspecified | None | None | None | 3820864 | Station Wagon/Sport Utility Vehicle | Sedan | None | None | None |
| 128 | 01/01/2018 | 21:48 | BROOKLYN | 11207 | 40.657753 | -73.896120 | (40.657753, -73.89612) | LINDEN BOULEVARD | WILLIAMS AVENUE | None | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | Following Too Closely | Unspecified | None | None | None | 3820865 | Sedan | Station Wagon/Sport Utility Vehicle | None | None | None |

  • For this disaster of a zipcode, let’s do some further visual analysis before we do some machine learning. When are most of the collisions occurring?

Economic Analysis of Zipcode

  • Now that we’ve seen the extent of the damage, can we examine some factors that may contribute to these injuries? We can get a street-level view by bringing in data such as median household income. Low household income might help explain why the infrastructure of the area is so poor.

        Zipcode      City  Density  Median Household Income  Median Home Value  Land Area
23932     11239  Brooklyn  23470.0                  26275.0           354600.0       0.57
146065    11239  Brooklyn  23470.0                  26275.0           354600.0       0.57
67858     11239  Brooklyn  23470.0                  26275.0           354600.0       0.57
145960    11239  Brooklyn  23470.0                  26275.0           354600.0       0.57
145964    11239  Brooklyn  23470.0                  26275.0           354600.0       0.57
...         ...       ...      ...                      ...                ...        ...
182255    11249  Brooklyn      NaN                      NaN                NaN        NaN
182278    11249  Brooklyn      NaN                      NaN                NaN        NaN
182345    11249  Brooklyn      NaN                      NaN                NaN        NaN
182379    11249  Brooklyn      NaN                      NaN                NaN        NaN
182497    11249  Brooklyn      NaN                      NaN                NaN        NaN

182519 rows × 6 columns

  • Zip code 11249 is missing most of its economic data. 11249 is Williamsburg, which is actually one of the wealthier zip codes in Brooklyn, so if the data were present it would most likely top the list in Brooklyn. This is further supported by the previous visuals, where 11249 has some of the fewest collisions in the hotspot that is Brooklyn.
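The table above has one row per crash with zip-level attributes repeated on each row, which suggests a left join of the crash data onto a zip code lookup. The sketch below assumes that reading; `attach_profile`, `zips_missing`, and the field names are hypothetical. The values for 11239 and 11249 come from the table above, and those for 11207 from the table for that zip code in this section.

```python
import math

FIELDS = ("density", "median_household_income", "median_home_value", "land_area")

# Zip-level attributes as printed in this section; 11249 is NaN throughout.
zip_profiles = {
    "11239": dict(zip(FIELDS, (23470.0, 26275.0, 354600.0, 0.57))),
    "11207": dict(zip(FIELDS, (34965.0, 32945.0, 403800.0, 2.67))),
    "11249": dict.fromkeys(FIELDS, math.nan),
}

# (row index, zip code) pairs taken from the printed tables.
crash_rows = [(23932, "11239"), (72839, "11207"), (182255, "11249")]


def attach_profile(rows, profiles):
    """Left-join crash rows to zip profiles; unknown zips get NaN fields."""
    empty = dict.fromkeys(FIELDS, math.nan)
    return [
        {"row": index, "zip": zip_code, **profiles.get(zip_code, empty)}
        for index, zip_code in rows
    ]


def zips_missing(joined, field):
    """List the zip codes whose value for `field` is NaN after the join."""
    return sorted({r["zip"] for r in joined if math.isnan(r[field])})


joined = attach_profile(crash_rows, zip_profiles)
missing_income = zips_missing(joined, "median_household_income")
```

Listing the zip codes with missing values before ranking makes gaps like 11249 explicit instead of letting them silently drop out of a sort.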

        Zipcode      City  Density  Median Household Income  Median Home Value  Land Area
72839     11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
72842     11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
72981     11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
182515    11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
182518    11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
...         ...       ...      ...                      ...                ...        ...
182491    11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
182498    11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
182500    11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
0         11207  Brooklyn  34965.0                  32945.0           403800.0       2.67
182476    11207  Brooklyn  34965.0                  32945.0           403800.0       2.67

12183 rows × 6 columns

  • At the opposite end of the spectrum is the problem zip code 11207, which has the fourth-lowest median household income. This seems to align with the idea that weak economic indicators in an area suggest its infrastructure is not well maintained, which could contribute to the accidents there.

  • How can we further analyze this?

Machine Learning with Linear Modeling

Mean Squared Error: 0.5057016099968228
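The slides report only the score, not the features, target, or library behind it. As a reminder of what the number measures, here is a minimal one-feature least-squares fit with its mean squared error. The `xs` and `ys` values are made-up toy numbers chosen only to exercise the code; they are not crash data and are not meant to reproduce the score above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(ys, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Toy data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
mse = mean_squared_error(ys, predictions)
```

Because the error is squared, the MSE is in squared units of the target, so a value like 0.51 can only be judged against the scale of whatever was being predicted. Scoring on held-out rows rather than the training rows gives a fairer picture of how the model generalizes.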

Conclusion

  • Brooklyn, specifically zip code 11207, sees the most collisions
  • The peak hour around 5 pm is the most dangerous time to navigate NYC in terms of accidents and collisions
  • Economic data can be useful in inferring the volume of collisions in an area

References

  • Statistical Computing in Action 2022 - Data Jamboree https://asa-ssc.github.io/minisymp2022/jamboree/